Out of Set Language Modelling in Hierarchical Language Identification

Authors

  • Saad Irtza
  • Vidhyasaharan Sethu
  • Sarith Fernando
  • Eliathamby Ambikairajah
  • Haizhou Li
Abstract

This paper proposes a novel approach to the open set language identification task by introducing out of set (OOS) language modelling in a Hierarchical Language Identification (HLID) framework. Most recent language identification systems use data from sources other than the target languages to model OOS languages. The proposed approach does not require such data; instead, it uses only data from the target languages. Additionally, a diverse language selection method is incorporated to further improve OOS language modelling. This work also proposes a new training data selection method to develop compact models in a hierarchical framework. Experiments are conducted on the recent NIST LRE 2015 data set. With the proposed hierarchical OOS modelling, the overall results show relative improvements in Cavg of 32.9% with the diverse language selection method and 30.1% without it, each measured against the corresponding baseline system.
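
The relative improvements quoted in the abstract can be read as a fractional reduction in Cavg against the baseline. The following is a minimal sketch of that calculation, assuming the usual definition of relative improvement for a cost metric where lower is better. The symbols for the baseline and proposed-system costs are labels introduced here for illustration and are not taken from the paper.

```latex
\[
\text{relative improvement}
  = \frac{C_{\mathrm{avg}}^{\mathrm{base}} - C_{\mathrm{avg}}^{\mathrm{prop}}}
         {C_{\mathrm{avg}}^{\mathrm{base}}} \times 100\%
\]
```

Under this reading, a 32.9% relative improvement would mean the proposed system's Cavg is about 67.1% of the baseline's.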


Similar resources

Language-identification based on cross-language acoustic models and optimised information combination

This work is concerned with the subject of language identification (LID). Two central issues are addressed. The first is to analyse the trade-off between detailed acoustic modelling and robust...

... decoding, the second transforms the parameters from the decoding module and classifies the language. The common acoustic signal preprocessor calculates 12 RASTA filtered MFCCs, their first derivatives and ...


Comparison of Spectral Methods for Spoken Language Identification

Automatic spoken language identification is the task of identifying a language from the speech signal. Language identification systems can be divided into two categories: spectral-based methods and phonetic-based methods. In the former, short-time characteristics of the speech spectrum are extracted as a multi-dimensional vector. A statistical model of these features is then obtained for each language. The ...


Substitution as a Device of Grammatical Cohesion in English Contexts

The present study set out to investigate the effect of teaching substitution as a kind of grammatical cohesion on the true identification of confusing substitution elements with cohesive or non-cohesive roles in different contexts and also the production of modal, reporting and conditional contexts through clausal substitution acquaintance. To this end, the following procedures were taken. Firs...


SeerNet@INLI-FIRE-2017: Hierarchical Ensemble for Indian Native Language Identification

Native Language Identification has played an important role in forensics primarily for author profiling and identification. In this work, we discuss our approach to the shared task of Indian Language Identification. The task is primarily to identify the native language of the writer from the given XML file which contains a set of Facebook comments in the English language. We propose a hierarchi...


Textuality: The ‘form’ to Be Focused on in SLA

Due to the special (procedural) nature of language (verbal communication) ‘knowledge’, the dominant trends in applied linguistics research in the last few decades have advocated ‘acquisition’ rather than ‘learning’ activities, in which the main focus in SL & FL education should be on ‘meaning’, with some ‘focus-on-form’ being justified. But the ‘form’ to be ‘focused-on’ is mostly misconce...



Publication date: 2016